An In-Depth Analysis of Optimization-Guided Machine Learning Techniques for Robust Breast Cancer Classification

Authors: Aryan Shrivastava, Neha Shrivas, Vivek Shukla

DOI Link: https://doi.org/10.22214/ijraset.2026.81276

Abstract

Breast cancer is one of the most widespread and dangerous malignancies among women worldwide. It is a common cause of death and a serious public health challenge. Early and accurate diagnosis is crucial for improving survival rates, but conventional diagnostic methods are time-consuming, subjective, and dependent on clinical experience. In recent years, machine learning (ML) methods have become a powerful approach to automated breast cancer classification. Nonetheless, traditional ML methods often suffer from overfitting, high-dimensional feature spaces, suboptimal hyperparameter settings, and poor cross-dataset generalizability. To overcome these difficulties, this paper presents an optimization-guided machine learning framework intended to increase classification robustness and prediction accuracy. The study uses the Wisconsin Breast Cancer Dataset, a standard benchmark consisting of diagnostic features derived from digitized images of fine needle aspirates. Several optimization algorithms, namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO), are used for feature selection and hyperparameter tuning. These metaheuristic methods are combined with classifiers, including Support Vector Machine and Random Forest, to form hybrid prediction models. Experimental findings indicate significant improvements in accuracy, sensitivity, and specificity for the optimized models compared with non-optimized baseline models. The optimized models also reduce the feature set while maintaining high diagnostic accuracy, thereby lowering computational complexity and improving robustness. The results underscore the usefulness of optimization-driven ML systems in clinical decision support and point to a trustworthy, scalable, and effective means of detecting breast cancer at an early stage.

Introduction

The literature review shows that common classifiers like SVM, Random Forest, KNN, ANN, and CNNs are widely used for breast cancer detection, with ensemble and deep learning models generally performing better. To improve these models, optimization algorithms such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), and Differential Evolution (DE) are used for feature selection and hyperparameter tuning. Hybrid approaches (e.g., GA-SVM, PSO-RF) consistently outperform standalone models by improving generalization and reducing overfitting.
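To make the idea of metaheuristic feature selection concrete, the following is a minimal sketch of a binary Particle Swarm Optimization loop over feature masks. It is not the authors' implementation: the function name `binary_pso`, the sigmoid transfer rule, and the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are illustrative assumptions, and the cost function is left as a pluggable callable.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(
    cost_fn: Callable[[List[int]], float],
    n_features: int,
    n_particles: int = 10,
    n_iters: int = 20,
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # cognitive coefficient (assumed value)
    c2: float = 1.5,   # social coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimize cost_fn over 0/1 feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    # Each particle is a 0/1 mask: 1 keeps the feature, 0 drops it.
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_cost = [cost_fn(p) for p in positions]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (pbest[i][d] - positions[i][d])
                     + c2 * r2 * (gbest[d] - positions[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                velocities[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            cost = cost_fn(positions[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = positions[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = positions[i][:], cost
    return gbest, gbest_cost


# Toy cost for demonstration only: Hamming distance to a hypothetical target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_cost(mask: List[int]) -> float:
    return float(sum(a != b for a, b in zip(mask, TARGET)))


best_mask, best_cost = binary_pso(toy_cost, n_features=len(TARGET))
```

In a hybrid model such as PSO-RF, `cost_fn` would wrap a classifier evaluated on the selected feature columns rather than the toy distance used here.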

The methodology uses the Wisconsin Breast Cancer Dataset, which contains 569 samples with 30 diagnostic features. Data preparation includes normalization, handling of missing values, and outlier removal, and models are assessed with cross-validation. A wrapper-based optimization approach selects the most relevant features using a fitness function that balances accuracy against the number of features retained. Multiple classifiers (SVM, RF, KNN, ANN, XGBoost) are tested, with the optimization algorithms integrated into a hybrid framework.
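The summary does not state the exact fitness function, so the sketch below assumes one common form for wrapper-based selection: a weighted sum of classification error and the fraction of features kept. The name `wrapper_fitness`, the weight `alpha`, and the `accuracy_fn` callable are all hypothetical and chosen for illustration.

```python
from typing import Callable, Sequence


def wrapper_fitness(
    mask: Sequence[int],
    accuracy_fn: Callable[[Sequence[int]], float],
    alpha: float = 0.9,  # assumed trade-off weight, not taken from the paper
) -> float:
    """Cost to minimize: alpha * error + (1 - alpha) * fraction of features kept.

    mask        -- 0/1 flags, one per feature (1 = feature is selected)
    accuracy_fn -- returns the accuracy in [0, 1] of a classifier trained on
                   the selected features (for example, a cross-validated mean)
    """
    n_total = len(mask)
    n_selected = sum(1 for bit in mask if bit)
    if n_selected == 0:
        # An empty feature set cannot be evaluated; return the worst cost.
        return 1.0
    error = 1.0 - accuracy_fn(mask)
    return alpha * error + (1.0 - alpha) * (n_selected / n_total)


# Illustrative call with a placeholder accuracy, not a reported result.
example_mask = [1] * 15 + [0] * 15  # 15 of 30 features kept
example_cost = wrapper_fitness(example_mask, lambda m: 0.98)
```

With `alpha` close to 1 the search favors accuracy and treats feature count as a tie-breaker; lowering it pushes toward smaller subsets. In practice `accuracy_fn` could wrap something like scikit-learn's `cross_val_score` on the selected columns.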

Evaluation uses metrics such as accuracy, precision, recall, F1-score, specificity, ROC-AUC, and MCC. Results show that baseline models already perform well, with XGBoost and Random Forest leading. However, optimized hybrid models significantly outperform them, achieving higher accuracy (up to ~98.7%) and better sensitivity and specificity while reducing feature size from 30 to about 14–18 features.
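Most of the listed metrics follow directly from the four confusion-matrix counts. The sketch below shows those standard definitions; the helper name `binary_metrics` and the example counts are hypothetical and are not results from the study. ROC-AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math
from typing import Dict


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)        # also called sensitivity
    specificity = _safe_div(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts for illustration only (malignant treated as the positive class).
example = binary_metrics(tp=40, fp=2, tn=70, fn=2)
```

Reporting MCC alongside accuracy is useful on this dataset because the benign and malignant classes are not evenly balanced.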

Conclusion

This paper showed that machine learning models guided by metaheuristic optimization can substantially improve the robustness and predictive performance of breast cancer classification systems. By using metaheuristic algorithms to address suboptimal hyperparameter selection and high dimensionality, the proposed hybrid framework reduced the number of features while improving accuracy, sensitivity, specificity, and generalization stability. Statistical validation showed that the optimized models consistently outperformed the baseline classifiers across cross-validation folds. Among the compared strategies, the PSO-RF hybrid model was the most effective, as it combined strong ensemble learning with efficient parameter convergence. These results are consistent with recent studies showing that optimization-enhanced classifiers yield better diagnostic accuracy on medical data (Li et al., 2023; Alzubaidi et al., 2021). Future work can extend this framework by combining deep learning architectures with more sophisticated metaheuristic optimization methods. Convolutional neural networks could be hybridized with swarm-based or evolutionary algorithms to improve feature learning, especially in imaging-based diagnosis of breast cancer (Mirjalili, 2020). In addition, real-time clinical deployment will require integration with hospital information systems and validation on large, multi-center datasets to confirm that the models perform reliably in real-world settings. Another promising direction is the use of Explainable AI (XAI) methods to make predictions more transparent and trustworthy for clinicians. Interpretable feature attribution approaches such as SHAP and LIME may help address the problem of black-box decision-making in healthcare AI (Topol, 2020). By integrating optimization, deep learning, and explainability, future systems may achieve higher diagnostic accuracy while remaining clinically accountable and ethically reliable.

References

[1] Abdel-Basset, M., Mohamed, R., Chakrabortty, R. K., & Ryan, M. (2021). A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy. IEEE Access, 9, 79521–79540.
[2] Albahli, S. (2021). Efficient GAN-based chest radiographs (CXR) augmentation to diagnose coronavirus disease pneumonia. International Journal of Medical Informatics, 143, 104284.
[3] Alzubaidi, L., Zhang, J., Humaidi, A. J., et al. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8, 53.
[4] Arnold, M., Morgan, E., Rumgay, H., et al. (2022). Current and future burden of breast cancer: Global statistics for 2020 and 2040. The Lancet Oncology, 23(2), 155–164.
[5] Khan, S., Islam, N., Jan, Z., et al. (2021). A novel deep learning-based framework for breast cancer detection using ultrasound images. Diagnostics, 11(7), 1226.
[6] Li, X., Zhang, S., & Chen, H. (2023). Swarm intelligence optimization for medical diagnosis classification: A comprehensive review. Expert Systems with Applications, 213, 119072.
[7] McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577, 89–94.
[8] Mirjalili, S. (2020). Evolutionary algorithms and neural networks. Studies in Computational Intelligence, 780, Springer.
[9] Mirjalili, S., & Lewis, A. (2020). The Grey Wolf Optimizer. Advances in Engineering Software, 95, 51–67.
[10] Sung, H., Ferlay, J., Siegel, R. L., et al. (2021). Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide. CA: A Cancer Journal for Clinicians, 71(3), 209–249.
[11] Topol, E. (2020). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25, 44–56.
[12] Lundberg, S. M., Erion, G., Chen, H., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56–67.
[13] Ribeiro, M. T., Singh, S., & Guestrin, C. (2020). “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of KDD, 1135–1144.
[14] Yang, X. S., & He, X. (2020). Nature-inspired optimization algorithms in engineering. Journal of Computational Design and Engineering, 7(3), 231–240.
[15] Zhang, Y., et al. (2021). Hybrid machine learning approaches for breast cancer diagnosis. Biomedical Signal Processing and Control, 68, 102607.
[16] Hassan, M., et al. (2022). A comprehensive review of feature selection methods in medical diagnosis. Artificial Intelligence in Medicine, 123, 102210.
[17] Chen, T., & Guestrin, C. (2020). XGBoost: A scalable tree boosting system. ACM SIGKDD Explorations, 6(2), 785–794.
[18] Esteva, A., et al. (2021). Deep learning-enabled medical computer vision. NPJ Digital Medicine, 4, 5.
[19] Aggarwal, C. C. (2021). Neural networks and deep learning. Springer.
[20] World Health Organization (WHO). (2023). Breast cancer fact sheet. Geneva: WHO.

Copyright

Copyright © 2026 Aryan Shrivastava, Neha Shrivas, Vivek Shukla. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET81276

Publish Date : 2026-04-27

ISSN : 2321-9653

Publisher Name : IJRASET
